RapidMiner Studio documentation, version 10.1
Process Documents from Mail Store
(Text Processing)
Synopsis
Generates word vectors from a text collection stored on an IMAP or POP3 mail server.
Input
word list
The word list port.
connection (Connection)
This port can take a connection of type Mail (retrieve).
Output
example set (Data Table)
The example set port.
word list
The word list port.
connection (Connection)
If the input port connection has data, it will be put through to this output port.
Parameters
- mail_account: The mail connection used to retrieve the email. Only visible if the connection input port is not connected and the compatibility level is above 9.3.1.
- create_word_vector: If checked, the tokens of a document are used to generate a vector that represents the document numerically.
- vector_creation: Selects the schema for creating the word vector.
- add_meta_information: If checked, available meta information about the text, such as file name and date, is added as attributes.
- keep_text: If checked, the input text is stored as a special String attribute with the role text.
- prune_method: Specifies whether too frequent or too infrequent words should be ignored when building the word list, and how the frequencies are specified.
- prune_below_percent: Ignore words that appear in less than this percentage of all documents.
- prune_above_percent: Ignore words that appear in more than this percentage of all documents.
- prune_below_absolute: Ignore words that appear in fewer than this many documents.
- prune_above_absolute: Ignore words that appear in more than this many documents.
- prune_below_rank: Words are ordered by frequency, and words with a frequency lower than the frequency at the rank given by this percentage are pruned.
- prune_above_rank: Words are ordered by frequency, and words with a frequency higher than the frequency at the rank given by this percentage are pruned.
- datamanagement: Determines how the data is represented internally.
- define_store: The mail store connection can be defined either by using a session bound to a JNDI name or explicitly by specifying host and user.
- jndi_name: JNDI name referencing a mail session.
- host: IMAP or POP3 host name.
- user: IMAP or POP3 user name.
- password: IMAP or POP3 password.
- connection_properties: Additional properties for the mail store.
- protocol: IMAP or POP3.
- only_unseen: If checked, only new, unseen messages are processed.
- mark_seen: If checked, all processed messages are marked as read. Only works with IMAP, not with POP3.
- delete_messages: If checked, all processed messages are deleted. Especially useful for POP3.
- recursive: If checked, subfolders are scanned recursively.
- folder: Name of the IMAP folder to scan. Must be INBOX for POP3.
- download attachments: Select to download mails and attachments.
- attachment file-pattern: A pattern for the attachments you want to select. The usual wildcards ? and * are supported.
- attachment MIME-type: Enter the MIME type you want to select. If this entry and all additional entries are empty, all MIME types are selected.
- parallelize_vector_creation: Determines whether the execution of vector creation should be parallelized.
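
The percentual pruning parameters (prune_below_percent and prune_above_percent) can be illustrated with a small Python sketch. This is not RapidMiner's implementation. The function name prune_by_percent and the sample documents are hypothetical, and treating both thresholds as inclusive bounds is an assumption, since the reference above does not state how boundary values are handled.

```python
from collections import Counter


def prune_by_percent(documents, prune_below_percent, prune_above_percent):
    """Return the set of words kept after percentual pruning.

    documents is a list of token lists, one per mail. A word is kept when
    the percentage of documents containing it lies between the two
    thresholds (inclusive bounds are an assumption of this sketch).
    """
    n_docs = len(documents)
    if n_docs == 0:
        return set()

    # Document frequency: count each word at most once per document.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    kept = set()
    for word, count in doc_freq.items():
        share = 100.0 * count / n_docs
        if prune_below_percent <= share <= prune_above_percent:
            kept.add(word)
    return kept


# Hypothetical tokenized mails, for illustration only.
docs = [
    ["meeting", "agenda", "the"],
    ["meeting", "invoice", "the"],
    ["the", "invoice", "reminder"],
    ["the", "newsletter"],
]

kept = prune_by_percent(docs, prune_below_percent=30.0, prune_above_percent=90.0)
print(sorted(kept))  # ['invoice', 'meeting']
```

In this example "the" occurs in every document and is dropped by the upper threshold, while words that occur in only one of the four documents fall below the lower threshold.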